Sentence Boundary Detection in Turkish
Abstract
In this paper, we describe a method for sentence boundary detection in Turkish. The method uses simple heuristic knowledge of Turkish syllabification and its phonetic rules to disambiguate dots. The algorithm's test accuracy was measured at 96.02%. The main contribution of this study is a new lexicon-free method for distinguishing EOS (end-of-sentence) dots from dots used for other purposes.
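The abstract does not spell out the heuristics. The following is therefore only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes that a dot after a token which cannot be syllabified under Turkish syllable shapes (V, VC, VCC, CV, CVC, CVCC) marks an abbreviation rather than a sentence end. The syllable-shape pattern `SHAPE`, the capitalization and single-letter rules, and the function names `tr_lower`, `can_syllabify`, `is_eos` and `split_sentences` are all hypothetical choices made for illustration. Nothing here reproduces the reported 96.02% accuracy.

```python
import re
from typing import List, Optional

TR_VOWELS = set("aeıioöuü")
TR_LETTERS = TR_VOWELS | set("bcçdfgğhjklmnprsştvyz")

# Assumed syllable-shape constraint over a C/V skeleton: at most one consonant
# before the first vowel, up to three consonants between two vowels (a coda of
# two plus an onset of one, as in "Türk-çe"), at most two after the last vowel.
SHAPE = re.compile(r"C?V(?:C{0,3}V)*C{0,2}")


def tr_lower(s: str) -> str:
    """Lowercase with Turkish dotted/dotless i ("I" -> "ı", "İ" -> "i")."""
    return s.replace("I", "ı").replace("İ", "i").lower()


def can_syllabify(word: str) -> bool:
    """True if `word` is all Turkish letters and fits the assumed syllable shapes."""
    w = tr_lower(word)
    if not w or any(ch not in TR_LETTERS for ch in w):
        return False
    skeleton = "".join("V" if ch in TR_VOWELS else "C" for ch in w)
    return SHAPE.fullmatch(skeleton) is not None


def is_eos(prev: str, nxt: Optional[str]) -> bool:
    """Hypothetical decision rule for a dot followed by whitespace or end of text."""
    if nxt is None:
        return True   # nothing but whitespace follows: the dot ends the text
    if nxt[0].islower():
        return False  # a new sentence is assumed to start with a capital
    if prev.isalpha():
        if len(prev) == 1:
            return False  # single-letter initial, e.g. the "C" in "T.C."
        if not can_syllabify(prev):
            return False  # unpronounceable token such as "Dr" or "Prof": abbreviation
    return True


def split_sentences(text: str) -> List[str]:
    """Split `text` at the dots that `is_eos` classifies as sentence ends."""
    sentences, start = [], 0
    for m in re.finditer(r"\.", text):
        i = m.start()
        after = text[i + 1:]
        if after and not after[0].isspace():
            continue  # dot inside a token: "3.5", "T.C", "www.example"
        prev_m = re.search(r"(\w+)\Z", text[:i])
        prev = prev_m.group(1) if prev_m else ""
        nxt_m = re.match(r"\s*(\S+)", after)
        nxt = nxt_m.group(1) if nxt_m else None
        if is_eos(prev, nxt):
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

As an example, `split_sentences("Dr. Ayşe Yılmaz geldi. Prof. Kaya da geldi. Toplantı bitti.")` yields three sentences. The dots after "Dr" and "Prof" stay inside their sentences because neither token fits the assumed syllable shapes. The rule needs no abbreviation list, which matches the abstract's "lexicon-free" sense. It does misfire in both directions. Loanwords with initial clusters, such as "spor", are read as abbreviations. Pronounceable abbreviations, such as "Av.", are read as ordinary words.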
Similar resources
An Infrastructure for Turkish Prosody Generation in Text-to-Speech Synthesis
Text-to-speech engines benefit from natural language processing when generating appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps: pronunciation disambiguation, phonological phrase detection and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We prop...
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
Rule-Based Sentence Detection Method (RBSDM) for Turkish
The first step in generating a corpus that is representative of a language is determining its sentences, a complicated and hard problem but an important part of corpus generation. Different approaches have been tried to find sentence boundaries in various languages. In Turkish, the best-known ways of determining sentence boundaries are using statistics and mach...
TAG Analysis of Turkish Long Distance Dependencies
All permutations of a Turkish sentence with two levels of embedding are analyzed in order to develop an LTAG grammar that can account for Turkish long distance dependencies. The fact that Turkish allows only long distance topicalization and extraposition is shown to be connected to a condition (the coherence condition) that draws the boundary between the acceptable and unacceptable permutations of the ...
Speech Communication Session 4pSCb: Production and Perception I: Beyond the Speech Segment (Poster Session) 4pSCb49. Towards a model of intonational phonology of Turkish: Neutral intonation
This study proposes an Autosegmental-Metrical model of Turkish intonation based on sentences produced in neutral focus, as part of our ongoing research investigating Turkish intonational phonology. Tonal patterns of utterances were examined by varying the length of a word and a phrase, the location of stress, syntactic structures, and sentence types. Preliminary results suggest that Turkish has...
Publication date: 2004